Measuring Performance when Positives Are Rare: Relative Advantage versus Predictive Accuracy - A Biological Case Study
Abstract
This paper presents a new method of measuring performance when positives are rare and investigates whether Chomsky-like grammar representations are useful for learning accurate, comprehensible predictors of members of biological sequence families. The positive-only learning framework of the Inductive Logic Programming (ILP) system CProgol is used to generate a grammar for recognising a class of proteins known as human neuropeptide precursors (NPPs). Performance is measured using both predictive accuracy and a new cost function, Relative Advantage (RA). The RA results show that searching for NPPs by using our best NPP predictor as a filter is more than [...] times more efficient than randomly selecting proteins for synthesis and testing them for biological activity. Predictive accuracy is not a good measure of performance for this domain because it does not discriminate well between NPP recognition models: despite covering varying numbers of the rare positives, all the models are awarded a similar high score by predictive accuracy because they all exclude most of the abundant negatives.
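The closing point of the abstract, that predictive accuracy barely separates models when positives are rare, can be illustrated with a little arithmetic. The sketch below is only an illustration: the counts (10 positives among 10,000 proteins) and the two model names are hypothetical and are not taken from the paper. It does not implement Relative Advantage, whose definition is not given in this text.

```python
def predictive_accuracy(tp, fn, fp, tn):
    """Fraction of all examples (positive and negative) classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


def coverage(tp, fn):
    """Fraction of the positives that the model recognises."""
    return tp / (tp + fn)


# Hypothetical data set: positives are rare (assumed counts, not from the paper).
POSITIVES, NEGATIVES = 10, 9990

# Hypothetical models: (positives covered, negatives wrongly accepted).
MODELS = {"covers_2_of_10": (2, 20), "covers_8_of_10": (8, 20)}


def score(models):
    """Return {name: (accuracy, coverage)} for each hypothetical model."""
    results = {}
    for name, (tp, fp) in models.items():
        fn = POSITIVES - tp   # positives the model misses
        tn = NEGATIVES - fp   # negatives the model correctly excludes
        results[name] = (predictive_accuracy(tp, fn, fp, tn), coverage(tp, fn))
    return results


if __name__ == "__main__":
    for name, (acc, cov) in score(MODELS).items():
        print(f"{name}: accuracy={acc:.4f} coverage={cov:.2f}")
```

Under these assumed counts, the two models differ fourfold in the share of positives they cover, yet their accuracies differ by less than one tenth of a percentage point, because both exclude almost all of the abundant negatives.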
Similar resources
Measuring Performance when Positives are Rare: Relative Advantage versus Predictive
This paper presents a new method of measuring performance when positives are rare and investigates whether Chomsky-like grammar representations are useful for learning accurate comprehensible predictors of members of biological sequence families. The positive-only learning framework of the Inductive Logic Programming (ILP) system CProgol is used to generate a grammar for recognising a class o...
Measuring Performance when Positives are Rare
This paper presents a new method of measuring performance when positives are rare and investigates whether Chomsky-like grammar representations are useful for learning accurate comprehensible predictors of members of biological sequence families. The positive-only learning framework of the Inductive Logic Programming (ILP) system CProgol is used to generate a grammar for recognising a class of p...
Learning Chomsky-like Grammars for Biological Sequence Families
This paper presents a new method of measuring performance when positives are rare and investigates whether Chomsky-like grammar representations are useful for learning accurate comprehensible predictors of members of biological sequence families. The positive-only learning framework of the Inductive Logic Programming (ILP) system CProgol is used to generate a grammar for recognising a class of ...
Are Grammatical Representations Useful for Learning from Biological Sequence Data? - A Case Study
This paper investigates whether Chomsky-like grammar representations are useful for learning cost-effective, comprehensible predictors of members of biological sequence families. The Inductive Logic Programming (ILP) Bayesian approach to learning from positive examples is used to generate a grammar for recognising a class of proteins known as human neuropeptide precursors (NPPs). Collectively, ...
Credit Risk Predictive Ability of G-ZPP Model Versus V-ZPP Model
Credit risk management has become increasingly important in recent years. When a company faces a financial problem, it may not be able to fulfill its financial obligations, which can cause direct and indirect financial losses to shareholders, creditors, investors and other people in the community. Advanced credit risk models that are based on market value include improving credit quality...